Overview
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 453 | 437 |
| Missing cells (%) | 8.5% | 8.2% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
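The percentage and per-record figures in this table follow from the raw counts. The sketch below recomputes them, assuming that the cell total is variables times observations and that 45.3 KiB means 45.3 × 1024 bytes; both are assumptions read off the numbers, not stated by the report.

```python
# Figures copied from the Dataset statistics table.
n_variables = 12
n_observations = 446
missing_cells = {"Dataset A": 453, "Dataset B": 437}

# Assumption: every dataset has n_variables * n_observations cells.
total_cells = n_variables * n_observations


def missing_pct(missing: int, total: int) -> float:
    """Share of missing cells as a percentage, rounded to one decimal."""
    return round(100 * missing / total, 1)


missing_share = {
    name: missing_pct(count, total_cells) for name, count in missing_cells.items()
}

# Assumption: 1 KiB = 1024 bytes, so 45.3 KiB spread over 446 rows.
avg_record_bytes = round(45.3 * 1024 / n_observations, 1)

print(missing_share, avg_record_bytes)
```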
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 4 | 5 |
| Categorical | 5 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High correlation |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High correlation |
| Parch is highly imbalanced (53.6%) | Alert not present in this dataset | Imbalance |
| Age has 102 (22.9%) missing values | Age has 85 (19.1%) missing values | Missing |
| Cabin has 351 (78.7%) missing values | Cabin has 351 (78.7%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 312 (70.0%) zeros | SibSp has 315 (70.6%) zeros | Zeros |
| Fare has 7 (1.6%) zeros | Fare has 9 (2.0%) zeros | Zeros |
| Alert not present in this dataset | Parch has 341 (76.5%) zeros | Zeros |
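The report does not say how the 53.6% imbalance figure for Parch is derived. One definition that reproduces it from the Dataset A Parch counts (339, 58, 46, 2, 1, as listed under the Parch variable) is one minus the Shannon entropy divided by its maximum for that number of classes. The sketch below uses that definition as an assumption; the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """Return 1 - (Shannon entropy / maximum entropy) for class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. This is an assumed definition, not confirmed by the report.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Parch value counts for Dataset A, copied from the report.
parch_a = [339, 58, 46, 2, 1]
score_a = imbalance_score(parch_a)
print(f"{score_a:.1%}")  # 53.6%
```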
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2025-03-24 21:58:56.250521 | 2025-03-24 21:58:57.864474 |
| Analysis finished | 2025-03-24 21:58:57.861354 | 2025-03-24 21:59:00.093510 |
| Duration | 1.61 seconds | 2.23 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
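The Duration row can be checked against the start and finish timestamps. A minimal sketch, assuming the timestamps share one time zone and use the format shown in the table:

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# (analysis started, analysis finished) copied from the table.
runs = {
    "Dataset A": ("2025-03-24 21:58:56.250521", "2025-03-24 21:58:57.861354"),
    "Dataset B": ("2025-03-24 21:58:57.864474", "2025-03-24 21:59:00.093510"),
}

durations = {
    name: round(
        (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds(),
        2,
    )
    for name, (start, end) in runs.items()
}

print(durations)
```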
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 444.5852 | 427.79372 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 3 | 2 |
| Maximum | 891 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 3 | 2 |
| 5-th percentile | 50.25 | 48.25 |
| Q1 | 222.5 | 213.75 |
| median | 448.5 | 413.5 |
| Q3 | 654.75 | 640.75 |
| 95-th percentile | 849.75 | 841.75 |
| Maximum | 891 | 891 |
| Range | 888 | 889 |
| Interquartile range (IQR) | 432.25 | 427 |
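Range and interquartile range are differences of other rows in this table: range is maximum minus minimum, and IQR is Q3 minus Q1. A short check using the PassengerId figures:

```python
# Quantile figures for PassengerId, copied from the table.
quantiles = {
    "Dataset A": {"min": 3, "q1": 222.5, "q3": 654.75, "max": 891},
    "Dataset B": {"min": 2, "q1": 213.75, "q3": 640.75, "max": 891},
}

spread = {
    name: {
        "range": q["max"] - q["min"],  # maximum minus minimum
        "iqr": q["q3"] - q["q1"],      # third quartile minus first quartile
    }
    for name, q in quantiles.items()
}

print(spread)
```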
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 255.16398 | 253.23946 |
| Coefficient of variation (CV) | 0.57393719 | 0.59196628 |
| Kurtosis | -1.1691995 | -1.1508941 |
| Mean | 444.5852 | 427.79372 |
| Median Absolute Deviation (MAD) | 217 | 215.5 |
| Skewness | 0.025472839 | 0.093424544 |
| Sum | 198285 | 190796 |
| Variance | 65108.657 | 64130.223 |
| Monotonicity | Not monotonic | Not monotonic |
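Several rows in this table are tied together by simple identities: mean is sum divided by the number of observations, variance is the square of the standard deviation, and the coefficient of variation is standard deviation divided by mean. The sketch below recomputes them from the reported sum and standard deviation, assuming 446 non-missing observations per dataset (PassengerId has no missing values).

```python
n_observations = 446  # rows per dataset, from the Overview table

# Sum and standard deviation for PassengerId, copied from the table.
reported = {
    "Dataset A": {"sum": 198285, "std": 255.16398},
    "Dataset B": {"sum": 190796, "std": 253.23946},
}

derived = {}
for name, r in reported.items():
    mean = r["sum"] / n_observations
    derived[name] = {
        "mean": mean,                  # sum / n
        "cv": r["std"] / mean,         # coefficient of variation
        "variance": r["std"] ** 2,     # square of the standard deviation
    }

for name, d in derived.items():
    print(name, round(d["mean"], 4), round(d["cv"], 6), round(d["variance"], 2))
```

Small differences in the last printed digit are expected, because the inputs are themselves rounded in the report.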
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 480 | 1 | 0.2% |
| 857 | 1 | 0.2% |
| 22 | 1 | 0.2% |
| 726 | 1 | 0.2% |
| 509 | 1 | 0.2% |
| 532 | 1 | 0.2% |
| 855 | 1 | 0.2% |
| 477 | 1 | 0.2% |
| 210 | 1 | 0.2% |
| 264 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 195 | 1 | 0.2% |
| 358 | 1 | 0.2% |
| 740 | 1 | 0.2% |
| 184 | 1 | 0.2% |
| 312 | 1 | 0.2% |
| 381 | 1 | 0.2% |
| 145 | 1 | 0.2% |
| 569 | 1 | 0.2% |
| 368 | 1 | 0.2% |
| 136 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 3 | 1 | |
| 4 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 12 | 1 | |
| 13 | 1 | |
| 18 | 1 | |
| 21 | 1 | |
| 22 | 1 |
| Value | Count | Frequency (%) |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 9 | 1 | |
| 13 | 1 | |
| 14 | 1 | |
| 19 | 1 | |
| 20 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
[Plot: category frequencies for Dataset A and Dataset B]
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 0 |
| 2nd row | 1 | 0 |
| 3rd row | 0 | 1 |
| 4th row | 0 | 1 |
| 5th row | 0 | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 281 | 63.0% |
| 1 | 165 | 37.0% |
| Value | Count | Frequency (%) |
| 0 | 282 | 63.2% |
| 1 | 164 | 36.8% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| 0 | 281 | |
| 1 | 165 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 281 | |
| 1 | 165 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 281 | |
| 1 | 165 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 281 | |
| 1 | 165 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 281 | |
| 1 | 165 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
[Plot: category frequencies for Dataset A and Dataset B]
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 2 |
| 2nd row | 2 | 3 |
| 3rd row | 3 | 2 |
| 4th row | 3 | 1 |
| 5th row | 3 | 1 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 248 | |
| 1 | 108 | |
| 2 | 90 | 20.2% |
| Value | Count | Frequency (%) |
| 3 | 245 | |
| 1 | 104 | |
| 2 | 97 | 21.7% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| 3 | 248 | |
| 1 | 108 | |
| 2 | 90 | 20.2% |
| Value | Count | Frequency (%) |
| 3 | 245 | |
| 1 | 104 | |
| 2 | 97 | 21.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 248 | |
| 1 | 108 | |
| 2 | 90 | 20.2% |
| Value | Count | Frequency (%) |
| 3 | 245 | |
| 1 | 104 | |
| 2 | 97 | 21.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 248 | |
| 1 | 108 | |
| 2 | 90 | 20.2% |
| Value | Count | Frequency (%) |
| 3 | 245 | |
| 1 | 104 | |
| 2 | 97 | 21.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 248 | |
| 1 | 108 | |
| 2 | 90 | 20.2% |
| Value | Count | Frequency (%) |
| 3 | 245 | |
| 1 | 104 | |
| 2 | 97 | 21.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 248 | |
| 1 | 108 | |
| 2 | 90 | 20.2% |
| Value | Count | Frequency (%) |
| 3 | 245 | |
| 1 | 104 | |
| 2 | 97 | 21.7% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 56 | 67 |
| Median length | 47 | 49 |
| Mean length | 26.367713 | 26.556054 |
| Min length | 13 | 12 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Wick, Mrs. George Dennick (Mary Hitchcock) | Funk, Miss. Annie Clemmer |
| 2nd row | Beesley, Mr. Lawrence | Nankoff, Mr. Minko |
| 3rd row | Oreskovic, Mr. Luka | Becker, Master. Richard F |
| 4th row | Olsen, Mr. Henry Margido | Ryerson, Miss. Emily Borie |
| 5th row | Toufik, Mr. Nakli | Bidois, Miss. Rosalie |
| Value | Count | Frequency (%) |
| mr | 266 | 15.0% |
| miss | 95 | 5.3% |
| mrs | 47 | 2.6% |
| william | 30 | 1.7% |
| master | 24 | 1.3% |
| henry | 21 | 1.2% |
| john | 19 | 1.1% |
| james | 14 | 0.8% |
| joseph | 11 | 0.6% |
| thomas | 11 | 0.6% |
| Other values (875) | 1241 |
| Value | Count | Frequency (%) |
| mr | 266 | 14.9% |
| miss | 94 | 5.3% |
| mrs | 60 | 3.4% |
| william | 38 | 2.1% |
| john | 23 | 1.3% |
| henry | 21 | 1.2% |
| master | 19 | 1.1% |
| james | 14 | 0.8% |
| george | 13 | 0.7% |
| richard | 12 | 0.7% |
| Other values (870) | 1230 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1333 | 11.3% |
| r | 956 | 8.1% |
| e | 831 | 7.1% |
| a | 829 | 7.0% |
| i | 664 | 5.6% |
| n | 631 | 5.4% |
| s | 630 | 5.4% |
| M | 568 | 4.8% |
| l | 502 | 4.3% |
| o | 487 | 4.1% |
| Other values (48) | 4329 |
| Value | Count | Frequency (%) |
| (space) | 1344 | 11.3% |
| r | 971 | 8.2% |
| e | 823 | 6.9% |
| a | 808 | 6.8% |
| i | 667 | 5.6% |
| s | 660 | 5.6% |
| n | 648 | 5.5% |
| M | 562 | 4.7% |
| l | 532 | 4.5% |
| o | 505 | 4.3% |
| Other values (49) | 4324 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 11760 |
| Value | Count | Frequency (%) |
| (unknown) | 11844 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1333 | 11.3% |
| r | 956 | 8.1% |
| e | 831 | 7.1% |
| a | 829 | 7.0% |
| i | 664 | 5.6% |
| n | 631 | 5.4% |
| s | 630 | 5.4% |
| M | 568 | 4.8% |
| l | 502 | 4.3% |
| o | 487 | 4.1% |
| Other values (48) | 4329 |
| Value | Count | Frequency (%) |
| (space) | 1344 | 11.3% |
| r | 971 | 8.2% |
| e | 823 | 6.9% |
| a | 808 | 6.8% |
| i | 667 | 5.6% |
| s | 660 | 5.6% |
| n | 648 | 5.5% |
| M | 562 | 4.7% |
| l | 532 | 4.5% |
| o | 505 | 4.3% |
| Other values (49) | 4324 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 11760 |
| Value | Count | Frequency (%) |
| (unknown) | 11844 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1333 | 11.3% |
| r | 956 | 8.1% |
| e | 831 | 7.1% |
| a | 829 | 7.0% |
| i | 664 | 5.6% |
| n | 631 | 5.4% |
| s | 630 | 5.4% |
| M | 568 | 4.8% |
| l | 502 | 4.3% |
| o | 487 | 4.1% |
| Other values (48) | 4329 |
| Value | Count | Frequency (%) |
| (space) | 1344 | 11.3% |
| r | 971 | 8.2% |
| e | 823 | 6.9% |
| a | 808 | 6.8% |
| i | 667 | 5.6% |
| s | 660 | 5.6% |
| n | 648 | 5.5% |
| M | 562 | 4.7% |
| l | 532 | 4.5% |
| o | 505 | 4.3% |
| Other values (49) | 4324 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 11760 |
| Value | Count | Frequency (%) |
| (unknown) | 11844 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1333 | 11.3% |
| r | 956 | 8.1% |
| e | 831 | 7.1% |
| a | 829 | 7.0% |
| i | 664 | 5.6% |
| n | 631 | 5.4% |
| s | 630 | 5.4% |
| M | 568 | 4.8% |
| l | 502 | 4.3% |
| o | 487 | 4.1% |
| Other values (48) | 4329 |
| Value | Count | Frequency (%) |
| (space) | 1344 | 11.3% |
| r | 971 | 8.2% |
| e | 823 | 6.9% |
| a | 808 | 6.8% |
| i | 667 | 5.6% |
| s | 660 | 5.6% |
| n | 648 | 5.5% |
| M | 562 | 4.7% |
| l | 532 | 4.5% |
| o | 505 | 4.3% |
| Other values (49) | 4324 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
[Plot: category frequencies for Dataset A and Dataset B]
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.6547085 | 4.690583 |
| Min length | 4 | 4 |
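The mean length for Sex is a count-weighted average of the two string lengths ("male" has 4 characters, "female" has 6). The sketch below recomputes it from the male and female counts reported under Common Values for this variable; the helper name `length_summary` is hypothetical.

```python
# Value counts for Sex, copied from the Common Values tables.
value_counts = {
    "Dataset A": {"male": 300, "female": 146},
    "Dataset B": {"male": 292, "female": 154},
}


def length_summary(counts: dict[str, int]) -> tuple[int, float]:
    """Return (total characters, mean string length) for a value-count mapping."""
    total_chars = sum(len(value) * n for value, n in counts.items())
    total_rows = sum(counts.values())
    return total_chars, total_chars / total_rows


summary = {name: length_summary(c) for name, c in value_counts.items()}
print(summary)
```

The total character counts (2076 and 2092) also match the totals listed under "Most occurring categories" for this variable.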
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | female | female |
| 2nd row | male | male |
| 3rd row | male | male |
| 4th row | male | female |
| 5th row | male | female |
Common Values
| Value | Count | Frequency (%) |
| male | 300 | |
| female | 146 |
| Value | Count | Frequency (%) |
| male | 292 | |
| female | 154 |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| male | 300 | |
| female | 146 |
| Value | Count | Frequency (%) |
| male | 292 | |
| female | 154 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 592 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 146 | 7.0% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2076 |
| Value | Count | Frequency (%) |
| (unknown) | 2092 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 592 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 146 | 7.0% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2076 |
| Value | Count | Frequency (%) |
| (unknown) | 2092 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 592 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 146 | 7.0% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2076 |
| Value | Count | Frequency (%) |
| (unknown) | 2092 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 592 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 146 | 7.0% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 72 | 72 |
| Distinct (%) | 20.9% | 19.9% |
| Missing | 102 | 85 |
| Missing (%) | 22.9% | 19.1% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.032006 | 29.889197 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.75 |
| Maximum | 74 | 80 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
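For Age, the two percentage rows use different denominators. The figures are consistent with Missing (%) being taken over all 446 rows and Distinct (%) over the non-missing rows only. This is inferred from the numbers, not stated by the report, so treat it as an assumption.

```python
n_rows = 446  # observations per dataset, from the Overview table

# Distinct and missing counts for Age, copied from the table.
age = {
    "Dataset A": {"distinct": 72, "missing": 102},
    "Dataset B": {"distinct": 72, "missing": 85},
}

age_pct = {}
for name, a in age.items():
    non_missing = n_rows - a["missing"]
    age_pct[name] = {
        # Assumed denominator: all rows.
        "missing_pct": round(100 * a["missing"] / n_rows, 1),
        # Assumed denominator: rows with a value present.
        "distinct_pct": round(100 * a["distinct"] / non_missing, 1),
    }

print(age_pct)
```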
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.75 |
| 5-th percentile | 4 | 4 |
| Q1 | 20.75 | 21 |
| median | 28 | 29 |
| Q3 | 36.125 | 38 |
| 95-th percentile | 55.7 | 54 |
| Maximum | 74 | 80 |
| Range | 73.58 | 79.25 |
| Interquartile range (IQR) | 15.375 | 17 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.384357 | 14.089438 |
| Coefficient of variation (CV) | 0.4954655 | 0.47138899 |
| Kurtosis | 0.30545645 | 0.41501417 |
| Mean | 29.032006 | 29.889197 |
| Median Absolute Deviation (MAD) | 8 | 8 |
| Skewness | 0.36281216 | 0.33798708 |
| Sum | 9987.01 | 10790 |
| Variance | 206.90974 | 198.51227 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 30 | 18 | 4.0% |
| 24 | 16 | 3.6% |
| 28 | 13 | 2.9% |
| 25 | 13 | 2.9% |
| 19 | 12 | 2.7% |
| 29 | 11 | 2.5% |
| 22 | 11 | 2.5% |
| 27 | 11 | 2.5% |
| 21 | 11 | 2.5% |
| 35 | 10 | 2.2% |
| Other values (62) | 218 | |
| (Missing) | 102 |
| Value | Count | Frequency (%) |
| 24 | 20 | 4.5% |
| 22 | 16 | 3.6% |
| 30 | 14 | 3.1% |
| 18 | 13 | 2.9% |
| 36 | 13 | 2.9% |
| 29 | 13 | 2.9% |
| 28 | 12 | 2.7% |
| 31 | 11 | 2.5% |
| 33 | 11 | 2.5% |
| 25 | 11 | 2.5% |
| Other values (62) | 227 | |
| (Missing) | 85 | 19.1% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.67 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 5 | |
| 2 | 7 | |
| 3 | 1 | 0.2% |
| 4 | 5 | |
| 5 | 2 | 0.4% |
| 6 | 3 | |
| 7 | 3 |
| Value | Count | Frequency (%) |
| 0.75 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 2 | 0.4% |
| 2 | 5 | |
| 3 | 4 | |
| 4 | 5 | |
| 5 | 2 | 0.4% |
| 6 | 3 | |
| 7 | 2 | 0.4% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.57174888 | 0.44843049 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 312 | 315 |
| Zeros (%) | 70.0% | 70.6% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 3 | 2 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.3078879 | 0.91959187 |
| Coefficient of variation (CV) | 2.2875217 | 2.0506899 |
| Kurtosis | 15.965491 | 15.434885 |
| Mean | 0.57174888 | 0.44843049 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.6988898 | 3.3135527 |
| Sum | 255 | 200 |
| Variance | 1.7105709 | 0.84564922 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 0 | 312 | |
| 1 | 93 | 20.9% |
| 2 | 12 | 2.7% |
| 3 | 10 | 2.2% |
| 4 | 8 | 1.8% |
| 8 | 7 | 1.6% |
| 5 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 315 | |
| 1 | 97 | 21.7% |
| 2 | 16 | 3.6% |
| 4 | 8 | 1.8% |
| 3 | 7 | 1.6% |
| 5 | 2 | 0.4% |
| 8 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 312 | |
| 1 | 93 | 20.9% |
| 2 | 12 | 2.7% |
| 3 | 10 | 2.2% |
| 4 | 8 | 1.8% |
| 5 | 4 | 0.9% |
| 8 | 7 | 1.6% |
| Value | Count | Frequency (%) |
| 0 | 315 | |
| 1 | 97 | 21.7% |
| 2 | 16 | 3.6% |
| 3 | 7 | 1.6% |
| 4 | 8 | 1.8% |
| 5 | 2 | 0.4% |
| 8 | 1 | 0.2% |
Parch
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 5 | 7 |
| Distinct (%) | 1.1% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
[Plot: category frequencies for Dataset A and Dataset B]
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 1 | 1 |
| Unique (%) | 0.2% | 0.2% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 339 | |
| 1 | 58 | 13.0% |
| 2 | 46 | 10.3% |
| 4 | 2 | 0.4% |
| 3 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 341 | |
| 1 | 54 | 12.1% |
| 2 | 40 | 9.0% |
| 5 | 5 | 1.1% |
| 4 | 3 | 0.7% |
| 3 | 2 | 0.4% |
| 6 | 1 | 0.2% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| 0 | 339 | |
| 1 | 58 | 13.0% |
| 2 | 46 | 10.3% |
| 4 | 2 | 0.4% |
| 3 | 1 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 339 | |
| 1 | 58 | 13.0% |
| 2 | 46 | 10.3% |
| 4 | 2 | 0.4% |
| 3 | 1 | 0.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 339 | |
| 1 | 58 | 13.0% |
| 2 | 46 | 10.3% |
| 4 | 2 | 0.4% |
| 3 | 1 | 0.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 339 | |
| 1 | 58 | 13.0% |
| 2 | 46 | 10.3% |
| 4 | 2 | 0.4% |
| 3 | 1 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 339 | |
| 1 | 58 | 13.0% |
| 2 | 46 | 10.3% |
| 4 | 2 | 0.4% |
| 3 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 377 | 381 |
| Distinct (%) | 84.5% | 85.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.9282511 | 6.8206278 |
| Min length | 3 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 327 | 333 |
| Unique (%) | 73.3% | 74.7% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 36928 | 237671 |
| 2nd row | 248698 | 349218 |
| 3rd row | 315094 | 230136 |
| 4th row | C 4001 | PC 17608 |
| 5th row | 2641 | PC 17757 |
| Value | Count | Frequency (%) |
| pc | 32 | 5.5% |
| ca | 12 | 2.1% |
| a/5 | 8 | 1.4% |
| c.a | 8 | 1.4% |
| ston/o | 7 | 1.2% |
| 2 | 7 | 1.2% |
| 2343 | 7 | 1.2% |
| sc/paris | 7 | 1.2% |
| soton/oq | 6 | 1.0% |
| w./c | 6 | 1.0% |
| Other values (397) | 479 |
| Value | Count | Frequency (%) |
| pc | 31 | 5.5% |
| c.a | 13 | 2.3% |
| a/5 | 8 | 1.4% |
| 2 | 7 | 1.2% |
| ston/o | 7 | 1.2% |
| 347082 | 5 | 0.9% |
| ca | 5 | 0.9% |
| sc/paris | 5 | 0.9% |
| soton/oq | 4 | 0.7% |
| 113760 | 4 | 0.7% |
| Other values (403) | 478 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 388 | |
| 1 | 333 | |
| 2 | 292 | |
| 7 | 239 | 7.7% |
| 4 | 237 | 7.7% |
| 6 | 214 | 6.9% |
| 0 | 213 | 6.9% |
| 5 | 184 | 6.0% |
| 9 | 156 | 5.0% |
| 8 | 145 | 4.7% |
| Other values (22) | 689 |
| Value | Count | Frequency (%) |
| 3 | 391 | |
| 1 | 357 | |
| 2 | 309 | |
| 7 | 238 | 7.8% |
| 4 | 232 | 7.6% |
| 0 | 212 | 7.0% |
| 6 | 209 | 6.9% |
| 5 | 189 | 6.2% |
| 9 | 160 | 5.3% |
| 8 | 140 | 4.6% |
| Other values (22) | 605 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 3090 |
| Value | Count | Frequency (%) |
| (unknown) | 3042 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 388 | |
| 1 | 333 | |
| 2 | 292 | |
| 7 | 239 | 7.7% |
| 4 | 237 | 7.7% |
| 6 | 214 | 6.9% |
| 0 | 213 | 6.9% |
| 5 | 184 | 6.0% |
| 9 | 156 | 5.0% |
| 8 | 145 | 4.7% |
| Other values (22) | 689 |
| Value | Count | Frequency (%) |
| 3 | 391 | |
| 1 | 357 | |
| 2 | 309 | |
| 7 | 238 | 7.8% |
| 4 | 232 | 7.6% |
| 0 | 212 | 7.0% |
| 6 | 209 | 6.9% |
| 5 | 189 | 6.2% |
| 9 | 160 | 5.3% |
| 8 | 140 | 4.6% |
| Other values (22) | 605 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 3090 |
| Value | Count | Frequency (%) |
| (unknown) | 3042 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 388 | |
| 1 | 333 | |
| 2 | 292 | |
| 7 | 239 | 7.7% |
| 4 | 237 | 7.7% |
| 6 | 214 | 6.9% |
| 0 | 213 | 6.9% |
| 5 | 184 | 6.0% |
| 9 | 156 | 5.0% |
| 8 | 145 | 4.7% |
| Other values (22) | 689 |
| Value | Count | Frequency (%) |
| 3 | 391 | |
| 1 | 357 | |
| 2 | 309 | |
| 7 | 238 | 7.8% |
| 4 | 232 | 7.6% |
| 0 | 212 | 7.0% |
| 6 | 209 | 6.9% |
| 5 | 189 | 6.2% |
| 9 | 160 | 5.3% |
| 8 | 140 | 4.6% |
| Other values (22) | 605 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 3090 |
| Value | Count | Frequency (%) |
| (unknown) | 3042 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 388 | |
| 1 | 333 | |
| 2 | 292 | |
| 7 | 239 | 7.7% |
| 4 | 237 | 7.7% |
| 6 | 214 | 6.9% |
| 0 | 213 | 6.9% |
| 5 | 184 | 6.0% |
| 9 | 156 | 5.0% |
| 8 | 145 | 4.7% |
| Other values (22) | 689 |
| Value | Count | Frequency (%) |
| 3 | 391 | |
| 1 | 357 | |
| 2 | 309 | |
| 7 | 238 | 7.8% |
| 4 | 232 | 7.6% |
| 0 | 212 | 7.0% |
| 6 | 209 | 6.9% |
| 5 | 189 | 6.2% |
| 9 | 160 | 5.3% |
| 8 | 140 | 4.6% |
| Other values (22) | 605 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 170 | 174 |
| Distinct (%) | 38.1% | 39.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 33.246851 | 33.743067 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 7 | 9 |
| Zeros (%) | 1.6% | 2.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.225 | 7.225 |
| Q1 | 7.8958 | 7.8958 |
| median | 14.2 | 13.5 |
| Q3 | 31.20625 | 30.5 |
| 95-th percentile | 130.2375 | 133.65 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 23.31045 | 22.6042 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 52.880301 | 57.79157 |
| Coefficient of variation (CV) | 1.5905356 | 1.7126947 |
| Kurtosis | 32.867076 | 32.917728 |
| Mean | 33.246851 | 33.743067 |
| Median Absolute Deviation (MAD) | 6.95 | 6.25 |
| Skewness | 4.7997987 | 4.9797293 |
| Sum | 14828.096 | 15049.408 |
| Variance | 2796.3262 | 3339.8656 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 7.75 | 24 | 5.4% |
| 13 | 22 | 4.9% |
| 8.05 | 22 | 4.9% |
| 26 | 17 | 3.8% |
| 7.8958 | 15 | 3.4% |
| 7.2292 | 11 | 2.5% |
| 7.925 | 11 | 2.5% |
| 26.55 | 8 | 1.8% |
| 10.5 | 8 | 1.8% |
| 7.225 | 7 | 1.6% |
| Other values (160) | 301 |
| Value | Count | Frequency (%) |
| 7.8958 | 26 | 5.8% |
| 8.05 | 21 | 4.7% |
| 13 | 18 | 4.0% |
| 7.75 | 16 | 3.6% |
| 26 | 16 | 3.6% |
| 10.5 | 15 | 3.4% |
| 7.925 | 12 | 2.7% |
| 0 | 9 | 2.0% |
| 7.25 | 7 | 1.6% |
| 7.225 | 7 | 1.6% |
| Other values (164) | 299 |
| Value | Count | Frequency (%) |
| 0 | 7 | |
| 5 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | |
| 7.0542 | 1 | 0.2% |
| 7.125 | 3 |
| Value | Count | Frequency (%) |
| 0 | 9 | |
| 6.2375 | 1 | 0.2% |
| 6.4958 | 2 | 0.4% |
| 6.75 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 1 | 0.2% |
| 7.125 | 2 | 0.4% |
| 7.225 | 7 |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 78 | 79 |
| Distinct (%) | 82.1% | 83.2% |
| Missing | 351 | 351 |
| Missing (%) | 78.7% | 78.7% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 11 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.7052632 | 3.6315789 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 63 | 65 |
| Unique (%) | 66.3% | 68.4% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | D56 | F4 |
| 2nd row | A31 | B57 B59 B63 B66 |
| 3rd row | B94 | E67 |
| 4th row | B96 B98 | B86 |
| 5th row | C22 C26 | C101 |
| Value | Count | Frequency (%) |
| b96 | 3 | 2.7% |
| b98 | 3 | 2.7% |
| c23 | 3 | 2.7% |
| c25 | 3 | 2.7% |
| c27 | 3 | 2.7% |
| b51 | 2 | 1.8% |
| b53 | 2 | 1.8% |
| b55 | 2 | 1.8% |
| b22 | 2 | 1.8% |
| c123 | 2 | 1.8% |
| Other values (77) | 88 |
| Value | Count | Frequency (%) |
| b96 | 4 | 3.6% |
| b98 | 4 | 3.6% |
| b49 | 2 | 1.8% |
| e67 | 2 | 1.8% |
| d26 | 2 | 1.8% |
| f2 | 2 | 1.8% |
| c83 | 2 | 1.8% |
| e44 | 2 | 1.8% |
| b77 | 2 | 1.8% |
| c124 | 2 | 1.8% |
| Other values (79) | 87 |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 43 | |
| C | 43 | |
| B | 31 | 8.8% |
| 1 | 30 | 8.5% |
| 5 | 29 | 8.2% |
| 3 | 24 | 6.8% |
| 7 | 19 | 5.4% |
| (space) | 18 | 5.1% |
| 4 | 18 | 5.1% |
| 8 | 17 | 4.8% |
| Other values (9) | 80 |
| Value | Count | Frequency (%) |
| C | 35 | 10.1% |
| B | 34 | 9.9% |
| 2 | 31 | 9.0% |
| 1 | 29 | 8.4% |
| 6 | 24 | 7.0% |
| 5 | 22 | 6.4% |
| 9 | 20 | 5.8% |
| 3 | 20 | 5.8% |
| 4 | 20 | 5.8% |
| 8 | 19 | 5.5% |
| Other values (8) | 91 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 352 |
| Value | Count | Frequency (%) |
| (unknown) | 345 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 2 | 43 | |
| C | 43 | |
| B | 31 | 8.8% |
| 1 | 30 | 8.5% |
| 5 | 29 | 8.2% |
| 3 | 24 | 6.8% |
| 7 | 19 | 5.4% |
| (space) | 18 | 5.1% |
| 4 | 18 | 5.1% |
| 8 | 17 | 4.8% |
| Other values (9) | 80 |
| Value | Count | Frequency (%) |
| C | 35 | 10.1% |
| B | 34 | 9.9% |
| 2 | 31 | 9.0% |
| 1 | 29 | 8.4% |
| 6 | 24 | 7.0% |
| 5 | 22 | 6.4% |
| 9 | 20 | 5.8% |
| 3 | 20 | 5.8% |
| 4 | 20 | 5.8% |
| 8 | 19 | 5.5% |
| Other values (8) | 91 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 352 |
| Value | Count | Frequency (%) |
| (unknown) | 345 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 2 | 43 | |
| C | 43 | |
| B | 31 | 8.8% |
| 1 | 30 | 8.5% |
| 5 | 29 | 8.2% |
| 3 | 24 | 6.8% |
| 7 | 19 | 5.4% |
| (space) | 18 | 5.1% |
| 4 | 18 | 5.1% |
| 8 | 17 | 4.8% |
| Other values (9) | 80 |
| Value | Count | Frequency (%) |
| C | 35 | 10.1% |
| B | 34 | 9.9% |
| 2 | 31 | 9.0% |
| 1 | 29 | 8.4% |
| 6 | 24 | 7.0% |
| 5 | 22 | 6.4% |
| 9 | 20 | 5.8% |
| 3 | 20 | 5.8% |
| 4 | 20 | 5.8% |
| 8 | 19 | 5.5% |
| Other values (8) | 91 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 352 |
| Value | Count | Frequency (%) |
| (unknown) | 345 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 2 | 43 | |
| C | 43 | |
| B | 31 | 8.8% |
| 1 | 30 | 8.5% |
| 5 | 29 | 8.2% |
| 3 | 24 | 6.8% |
| 7 | 19 | 5.4% |
| (space) | 18 | 5.1% |
| 4 | 18 | 5.1% |
| 8 | 17 | 4.8% |
| Other values (9) | 80 |
| Value | Count | Frequency (%) |
| C | 35 | 10.1% |
| B | 34 | 9.9% |
| 2 | 31 | 9.0% |
| 1 | 29 | 8.4% |
| 6 | 24 | 7.0% |
| 5 | 22 | 6.4% |
| 9 | 20 | 5.8% |
| 3 | 20 | 5.8% |
| 4 | 20 | 5.8% |
| 8 | 19 | 5.5% |
| Other values (8) | 91 |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 1 |
| Missing (%) | 0.0% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
[Plot: category frequencies for Dataset A and Dataset B]
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | S |
| 2nd row | S | S |
| 3rd row | S | S |
| 4th row | S | C |
| 5th row | C | C |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| S | 316 | 70.9% |
| C | 80 | 17.9% |
| Q | 50 | 11.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| S | 332 | 74.4% |
| C | 78 | 17.5% |
| Q | 35 | 7.8% |
| (Missing) | 1 | 0.2% |
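The Frequency (%) column appears to be each count divided by the table total, rounded to one decimal. The following is a minimal sketch of that arithmetic using the Dataset A Embarked counts from the table above; the helper name `frequency_table` is hypothetical and not part of ydata-profiling.

```python
def frequency_table(counts):
    """Return (value, count, percent) rows, with percent rounded to one decimal."""
    total = sum(counts.values())
    return [(value, n, round(100 * n / total, 1)) for value, n in counts.items()]


# Embarked counts for Dataset A, copied from the Common Values table.
embarked_a = {"S": 316, "C": 80, "Q": 50}

rows = frequency_table(embarked_a)
for value, n, pct in rows:
    print(f"{value}: {n} ({pct}%)")
```

The same calculation for Dataset B depends on whether the one missing entry is counted in the total, which would explain why Q shows as 7.8% in this table (35 of 446) and 7.9% in the character tables (35 of 445).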
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| s | 316 | 70.9% |
| c | 80 | 17.9% |
| q | 50 | 11.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| s | 332 | 74.6% |
| c | 78 | 17.5% |
| q | 35 | 7.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| S | 316 | 70.9% |
| C | 80 | 17.9% |
| Q | 50 | 11.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| S | 332 | 74.6% |
| C | 78 | 17.5% |
| Q | 35 | 7.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| S | 316 | 70.9% |
| C | 80 | 17.9% |
| Q | 50 | 11.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| S | 332 | 74.6% |
| C | 78 | 17.5% |
| Q | 35 | 7.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| S | 316 | 70.9% |
| C | 80 | 17.9% |
| Q | 50 | 11.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| S | 332 | 74.6% |
| C | 78 | 17.5% |
| Q | 35 | 7.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| S | 316 | 70.9% |
| C | 80 | 17.9% |
| Q | 50 | 11.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| S | 332 | 74.6% |
| C | 78 | 17.5% |
| Q | 35 | 7.9% |
Interactions
Pairwise interaction plots for Dataset A and Dataset B (25 variable pairs). For 9 of these pairs, the Dataset A panel reads "Interaction plot not present for dataset".
Correlations
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.132 | 0.302 | 0.004 | 0.285 | 0.055 | -0.186 | 0.148 |
| Embarked | 0.000 | 1.000 | 0.138 | 0.074 | 0.024 | 0.238 | 0.091 | 0.073 | 0.129 |
| Fare | 0.132 | 0.138 | 1.000 | 0.221 | -0.051 | 0.450 | 0.148 | 0.469 | 0.203 |
| Parch | 0.302 | 0.074 | 0.221 | 1.000 | 0.000 | 0.000 | 0.221 | 0.312 | 0.125 |
| PassengerId | 0.004 | 0.024 | -0.051 | 0.000 | 1.000 | 0.039 | 0.081 | -0.062 | 0.091 |
| Pclass | 0.285 | 0.238 | 0.450 | 0.000 | 0.039 | 1.000 | 0.025 | 0.172 | 0.298 |
| Sex | 0.055 | 0.091 | 0.148 | 0.221 | 0.081 | 0.025 | 1.000 | 0.173 | 0.508 |
| SibSp | -0.186 | 0.073 | 0.469 | 0.312 | -0.062 | 0.172 | 0.173 | 1.000 | 0.148 |
| Survived | 0.148 | 0.129 | 0.203 | 0.125 | 0.091 | 0.298 | 0.508 | 0.148 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.113 | -0.249 | 0.085 | 0.217 | 0.078 | -0.173 | 0.227 |
| Embarked | 0.000 | 1.000 | 0.182 | 0.000 | 0.000 | 0.244 | 0.203 | 0.000 | 0.145 |
| Fare | 0.113 | 0.182 | 1.000 | 0.438 | -0.030 | 0.493 | 0.103 | 0.446 | 0.270 |
| Parch | -0.249 | 0.000 | 0.438 | 1.000 | -0.017 | 0.066 | 0.241 | 0.403 | 0.180 |
| PassengerId | 0.085 | 0.000 | -0.030 | -0.017 | 1.000 | 0.000 | 0.126 | -0.089 | 0.136 |
| Pclass | 0.217 | 0.244 | 0.493 | 0.066 | 0.000 | 1.000 | 0.000 | 0.137 | 0.279 |
| Sex | 0.078 | 0.203 | 0.103 | 0.241 | 0.126 | 0.000 | 1.000 | 0.184 | 0.535 |
| SibSp | -0.173 | 0.000 | 0.446 | 0.403 | -0.089 | 0.137 | 0.184 | 1.000 | 0.091 |
| Survived | 0.227 | 0.145 | 0.270 | 0.180 | 0.136 | 0.279 | 0.535 | 0.091 | 1.000 |
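To relate these matrices to the "High correlation" alerts in the Overview, the sketch below filters a few entries copied from the two tables against a cutoff. The cutoff of 0.5 is an assumption made for illustration and is not stated anywhere in this report; `high_pairs` is a hypothetical helper, not a ydata-profiling function.

```python
# A few entries copied from the Dataset A and Dataset B matrices above.
corr_a = {
    ("Sex", "Survived"): 0.508,
    ("Fare", "Pclass"): 0.450,
    ("Fare", "SibSp"): 0.469,
}
corr_b = {
    ("Sex", "Survived"): 0.535,
    ("Fare", "Pclass"): 0.493,
    ("Fare", "SibSp"): 0.446,
}

THRESHOLD = 0.5  # assumed cutoff, not taken from the report


def high_pairs(corr, threshold=THRESHOLD):
    """Return the variable pairs whose absolute correlation meets the cutoff."""
    return sorted(pair for pair, value in corr.items() if abs(value) >= threshold)


print("Dataset A:", high_pairs(corr_a))
print("Dataset B:", high_pairs(corr_b))
```

Under that assumed cutoff, only Sex and Survived qualify in both datasets, which matches the alerts listed in the Overview.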
Missing values
Dataset A, Dataset B
A simple visualization of nullity by column.
Dataset A, Dataset B
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Dataset A, Dataset B
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
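One common way to compute nullity correlation is the Pearson correlation between the missing-value indicators of two columns. The sketch below assumes that definition; the report does not state the exact formula behind its heatmap. The columns are hypothetical toy data, not rows from Dataset A or Dataset B.

```python
from math import sqrt


def nullity_correlation(col_x, col_y):
    """Pearson correlation between the missing-value indicators of two columns."""
    x = [1.0 if v is None else 0.0 for v in col_x]
    y = [1.0 if v is None else 0.0 for v in col_y]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # undefined when a column is never (or always) missing
    return cov / sqrt(var_x * var_y)


# Hypothetical toy columns; None marks a missing value.
age = [22.0, None, 30.0, None, 41.0, 18.0]
cabin = ["C85", None, None, None, "E46", "B4"]
fare = [7.25, 8.05, 13.0, 26.0, 7.9, 53.1]

print(nullity_correlation(age, cabin))  # positive: the two tend to be missing together
print(nullity_correlation(age, fare))   # None: fare has no missing values
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and columns with no missing values have no defined nullity correlation.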
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 856 | 857 | 1 | 1 | Wick, Mrs. George Dennick (Mary Hitchcock) | female | 45.0 | 1 | 1 | 36928 | 164.8667 | NaN | S |
| 21 | 22 | 1 | 2 | Beesley, Mr. Lawrence | male | 34.0 | 0 | 0 | 248698 | 13.0000 | D56 | S |
| 725 | 726 | 0 | 3 | Oreskovic, Mr. Luka | male | 20.0 | 0 | 0 | 315094 | 8.6625 | NaN | S |
| 508 | 509 | 0 | 3 | Olsen, Mr. Henry Margido | male | 28.0 | 0 | 0 | C 4001 | 22.5250 | NaN | S |
| 531 | 532 | 0 | 3 | Toufik, Mr. Nakli | male | NaN | 0 | 0 | 2641 | 7.2292 | NaN | C |
| 854 | 855 | 0 | 2 | Carter, Mrs. Ernest Courtenay (Lilian Hughes) | female | 44.0 | 1 | 0 | 244252 | 26.0000 | NaN | S |
| 476 | 477 | 0 | 2 | Renouf, Mr. Peter Henry | male | 34.0 | 1 | 0 | 31027 | 21.0000 | NaN | S |
| 209 | 210 | 1 | 1 | Blank, Mr. Henry | male | 40.0 | 0 | 0 | 112277 | 31.0000 | A31 | C |
| 263 | 264 | 0 | 1 | Harrison, Mr. William | male | 40.0 | 0 | 0 | 112059 | 0.0000 | B94 | S |
| 532 | 533 | 0 | 3 | Elias, Mr. Joseph Jr | male | 17.0 | 1 | 1 | 2690 | 7.2292 | NaN | C |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 357 | 358 | 0 | 2 | Funk, Miss. Annie Clemmer | female | 38.0 | 0 | 0 | 237671 | 13.0000 | NaN | S |
| 739 | 740 | 0 | 3 | Nankoff, Mr. Minko | male | NaN | 0 | 0 | 349218 | 7.8958 | NaN | S |
| 183 | 184 | 1 | 2 | Becker, Master. Richard F | male | 1.0 | 2 | 1 | 230136 | 39.0000 | F4 | S |
| 311 | 312 | 1 | 1 | Ryerson, Miss. Emily Borie | female | 18.0 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C |
| 380 | 381 | 1 | 1 | Bidois, Miss. Rosalie | female | 42.0 | 0 | 0 | PC 17757 | 227.5250 | NaN | C |
| 144 | 145 | 0 | 2 | Andrew, Mr. Edgardo Samuel | male | 18.0 | 0 | 0 | 231945 | 11.5000 | NaN | S |
| 568 | 569 | 0 | 3 | Doharr, Mr. Tannous | male | NaN | 0 | 0 | 2686 | 7.2292 | NaN | C |
| 367 | 368 | 1 | 3 | Moussa, Mrs. (Mantoura Boulos) | female | NaN | 0 | 0 | 2626 | 7.2292 | NaN | C |
| 135 | 136 | 0 | 2 | Richard, Mr. Emile | male | 23.0 | 0 | 0 | SC/PARIS 2133 | 15.0458 | NaN | C |
| 32 | 33 | 1 | 3 | Glynn, Miss. Mary Agatha | female | NaN | 0 | 0 | 335677 | 7.7500 | NaN | Q |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 413 | 414 | 0 | 2 | Cunningham, Mr. Alfred Fleming | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S |
| 392 | 393 | 0 | 3 | Gustafsson, Mr. Johan Birger | male | 28.0 | 2 | 0 | 3101277 | 7.9250 | NaN | S |
| 442 | 443 | 0 | 3 | Petterson, Mr. Johan Emil | male | 25.0 | 1 | 0 | 347076 | 7.7750 | NaN | S |
| 816 | 817 | 0 | 3 | Heininen, Miss. Wendla Maria | female | 23.0 | 0 | 0 | STON/O2. 3101290 | 7.9250 | NaN | S |
| 397 | 398 | 0 | 2 | McKane, Mr. Peter David | male | 46.0 | 0 | 0 | 28403 | 26.0000 | NaN | S |
| 102 | 103 | 0 | 1 | White, Mr. Richard Frasar | male | 21.0 | 0 | 1 | 35281 | 77.2875 | D26 | S |
| 884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S |
| 280 | 281 | 0 | 3 | Duane, Mr. Frank | male | 65.0 | 0 | 0 | 336439 | 7.7500 | NaN | Q |
| 456 | 457 | 0 | 1 | Millet, Mr. Francis Davis | male | 65.0 | 0 | 0 | 13509 | 26.5500 | E38 | S |
| 479 | 480 | 1 | 3 | Hirvonen, Miss. Hildur E | female | 2.0 | 0 | 1 | 3101298 | 12.2875 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 454 | 455 | 0 | 3 | Peduzzi, Mr. Joseph | male | NaN | 0 | 0 | A/5 2817 | 8.0500 | NaN | S |
| 149 | 150 | 0 | 2 | Byles, Rev. Thomas Roussel Davids | male | 42.0 | 0 | 0 | 244310 | 13.0000 | NaN | S |
| 198 | 199 | 1 | 3 | Madigan, Miss. Margaret "Maggie" | female | NaN | 0 | 0 | 370370 | 7.7500 | NaN | Q |
| 230 | 231 | 1 | 1 | Harris, Mrs. Henry Birkhardt (Irene Wallach) | female | 35.0 | 1 | 0 | 36973 | 83.4750 | C83 | S |
| 324 | 325 | 0 | 3 | Sage, Mr. George John Jr | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 248 | 249 | 1 | 1 | Beckwith, Mr. Richard Leonard | male | 37.0 | 1 | 1 | 11751 | 52.5542 | D35 | S |
| 381 | 382 | 1 | 3 | Nakid, Miss. Maria ("Mary") | female | 1.0 | 0 | 2 | 2653 | 15.7417 | NaN | C |
| 539 | 540 | 1 | 1 | Frolicher, Miss. Hedwig Margaritha | female | 22.0 | 0 | 2 | 13568 | 49.5000 | B39 | C |
| 118 | 119 | 0 | 1 | Baxter, Mr. Quigg Edmond | male | 24.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C |
| 194 | 195 | 1 | 1 | Brown, Mrs. James Joseph (Margaret Tobin) | female | 44.0 | 0 | 0 | PC 17610 | 27.7208 | B4 | C |
Duplicate rows
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.